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The BABAR experiment is the particle detector at the PEP-II B-factory facility at the Stanford Linear Accelerator 
Center 0. During the summer shutdown 2002 the BABAR Event Building and Level-3 trigger farm were upgraded 
from 60 Sun Ultra-5 machines and lOOMBit/s Ethernet to 50 Dual-CPU 1.4GHz Pentium-Ill systems with 
Gigabit Ethernet. Combined with an upgrade to Gigabit Ethernet on the source side and a major feature 
extraction software speedup, this pushes the performance of the BABAR event builder and L3 filter to 5.5kHz at 
current background levels, almost three times the original design rate of 2kHz. For our specific application the 
new farm provides 8.5 times the CPU power of the old system. 



1. THE BABAR DATA ACQUISITION 
SYSTEM 



The primary data handling components of the 
BABAR data acquisition system are the Online Data 
Flow (ODF) |2j and the Online Event Processing 
(OEP) ODF is the real-time system for trigger dis- 
tribution, readout of the front end electronics, feature 
extraction and multi-stage event building. OEP pro- 
vides near-real-time processing of fully built events: 
Level-3 software trigger, fast data quality monitoring, 
calibration processing and dispatch to data logging. 

Figure ^ gives an overview of the BABAR data ac- 
quisition system. 

Sampling and analog-to-digital conversion takes 
place in the front end electronics, mounted directly on 
or close to the subdetectors. From there, data is sent 
to "Read-Out-Modules" (ROMs) which are 9u VME 
modules that combine a commercial PowerPC single 
board computer running the Vx Works operating sys- 
tem with custom-built components. The number of 
VME crates and ROMs in the crate depends on de- 
sign and data load of the subdetector. The ROMs 
apply detector-specific feature extraction algorithms 
and then transfer their event contributions via the 
VME bus to the ROM in slot of their crate. This is 
the first stage of event building. The next stage is to 
combine all slot-0 ROM contributions of an event by 
sending them to the same Level-3 trigger farm node 
via Ethernet. The network event builder uses a UDP- 
based protocol, does not attempt to retransmit lost 



packets and employs a simple window-based flow con- 
trol mechanism. Events are sent to a quasi-random 
sequence of farm nodes by using a hash value based 
on the event time stamp. 

In the farm node, the Level-3 filter algorithms are 
applied on the complete event, events that pass the 
selection are sent to a central logging server where 
they are written to disk files. These files are subse- 
quently transferred to the SLAC computer center for 
permanent tape storage and near-real-time full recon- 
struction. 

The initial design goal was 2kHz of 30kByte events 
to be read out with negligible dead time from the front 
end electronics, feature extracted, built and shipped 
to the Level-3 trigger that would select 100Hz of 
events for logging and permanent storage. With the 
original farm of 32 330MHz Sun Ultra-5 machines the 
system did indeed reach 2kHz, limited by the amount 
of CPU time available to the Level-3 software trigger. 



2. PLANNING THE UPGRADE 
2.1. Reasons for Upgrading 

A farm upgrade became necessary for the following 
reasons: 

• The projected PEP-II luminosities and trigger 
rate extrapolations require the data acquisition 
system to handle higher rates and larger event 
sizes, in addition, operational experience showed 
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Figure 1: Overview of the BABAR Data Acquisition System 



that headroom of a factor of two is desirable to 
handle temporary high background conditions. 

• Having more CPU for more advanced Level-3 
trigger algorithms will facilitate to keep the log- 
ging rate low. 

• The Sun Ultra-5 machines were reaching their 
end of life, showing an increasing number of 
hardware failures. 

To be able to safely handle the projected trigger 
rates until at least 2005, it was decided that an up- 
grade should increase the farm CPU capacity by a 
factor of about 10 and that the 1 00MBit /s Ethernet 
connections in the network event builder should be 
upgraded to Gigabit Ethernet. 



2.2. Requirements and Upgrade Options 

Any upgrade option had to meet the following basic 
requirements: 

• Run one of the two operating systems supported 
as BABAR software platforms: RedHat Linux on 
i386 or Solaris/SPARC 

• Hardware must be available during the planning 
phase (fall 2001) so that performance measure- 
ments could be obtained and to prevent product 
availability from impacting the rigid deployment 
schedule during the Summer 2002 shutdown. 



Some options had been ruled out a priori, because 
they were expected to be too different from the exist- 
ing system and would have required extensive R&D 
and testing efforts, for example multiprocessor sys- 
tems with more than 2 CPUs where the concern were 
symmetric multiprocessing scaling and interrupt dis- 
tribution issues. 

In the end, the following major options were con- 
sidered: 

1. Rack-mountable Sun 750MHz UltraSPARC-Ill 
dual-CPU systems (4 rack units high) 

2. Replace Sun 330MHz Ultra-5 machines with 
rack-mountable 1-CPU 440MHz Sun machines 
(1 rack unit high), add systems as necessary and 
replace the farm later when faster and cheaper 
SPARC single or dual CPU systems became 
available 

3. Intel 1.2Ghz Pentium-Ill Dual-CPU systems 
with Linux (1 rack unit high) 

Both Sun options (f^and[2J) had the advantage that 
no software modifications to ODF were required and 
that they could just be used as a drop-in replacement. 
However, compared to the Intel option @, they were 
significantly more expensive and had much larger rack 
space requirements: For comparable CPU power, op- 
tionfljnceded more rack space because of slower CPUs 
and a larger form factor, option [3 just because of the 
much slower CPUs. 



MOGT003 



CHEP2003, La Jolla, California, March 24-28, 2003 



3 



The Intel option had the major disadvantage to re- 
quire modifications to ODF and OEP software in or- 
der to accommodate the different byte ordering. Op- 
tion |2 had the additional disadvantage to be a two- 
step upgrade where the second step relied on not yet 
existing hardware. 

After extensive consideration of these options, the 
Intel Linux option was chosen because it was expected 
to be the most cost-effective and long-term maintain- 
able solution. 



3. ADAPTING THE SOFTWARE 
3.1 . Data Flow Software 

ODF hardware and software is responsible for real- 
time readout, data transport, feature extraction and 
multi-stage event building. The last stage is a network 
event builder implemented on top of UDP, using no 
retransmit and a simple end-to-end flow control pro- 
tocol. The underlying link- level protocol is Ethernet. 
To maximize system performance, the original design 
required the machines on the receiving end of the net- 
work event builder to have the same byte ordering as 
the PowerPC embedded systems on the source side. 
For the network event builder, the event data payload 
is transparent binary data. 

In order to accommodate Intel/Linux farm ma- 
chines, ODF software had to be modified. To avoid 
wasting precious CPU time on the embedded sys- 
tems, all byte swapping occurs on the farm machines. 
The ODF network protocol was changed so that all 
datagram headers are 32-bit aligned and swappable 
and whole datagrams (ODF headers as well as event 
data payload) are 32-bit byte swapped after they have 
been received by the farm machines. ODF software 
then builds the event and delivers the complete pre- 
swapped event data to OEP. 

Since the ODF network protocol always uses the 
source byte order, it is independent of the byte or- 
der of the farm machines and allows the use of Sun 
or Linux machines, in principle even in a mixed farm. 
During testing we took advantage of this very impor- 
tant feature. 

In parallel to upgrading the farm to Gigabit Ether- 
net, the embedded systems were upgraded to Gigabit 
Ethernet. To minimize the overhead in CPU and wall 
clock time for sending a packet, a custom minimal 
UDP protocol stack and network device driver was 
developed. This driver sends a 1500 byte packet in 
17.3/is, compared to the 66/iS of the VxWorks driver 
that came with the Gigabit Ethernet interface card 
and the 100/iS needed by the 100MBit ethernet driver 
used before the upgrade. 



3.2. Online Event Processing and 
Level-3 Filter Software 

OEP and Levcl-3 filter obviously need to look at 
the event data. While all navigational information is 
32-bit wide and aligned on 32-bit boundaries, individ- 
ual containers can hold 8-bit, 16-bit and 32-bit quan- 
tities. As mentioned above, events are delivered to 
OEP pre-swapped, that means that the navigational 
information is already intact. The classes that de- 
scribe the data format then know how to fix up 8-bit 
and 16-bit quantities, this process only happens on 
demand. The on-disk representation of data is always 
big-endian, so on little endian machines the reverse 
process is applied before data is written out to disk or 
sent over the network. 

3.3. System Software and System 
Management 

System management for Linux machines in the 
BABAR online computing infrastructure was achieved 
by modifying the BABAR-online-specific version of 
Taylor to support RedHat Linux |5j] and to set up 
a Kickstart configuration for network installation of 
Linux machines. Taylor is a system post-install and 
configuration language developed at SLAC 0. 



4. TESTING AND HARDWARE 
ACQUISITION 

4.1. Choosing Hardware 

Since time and personnel resources for testing were 
limited, the initial choice of hardware was made, con- 
sidering experience from the SLAC windows group 
who was already operating a number of Dell Pow- 
erEdge 1550 1 rack-unit, 1.2GHz dual-CPU servers 
with Intel E1000 Gigabit Ethernet cards. We decided 
to use fiber-optic Gigabit Ethernet because at deci- 
sion time there was no significant experience at SLAC 
with the cheaper copper cable Gigabit Ethernet. 

A small number of machines was acquired and used 
in different testing modes: 

• Teststand testing during software development 
was used to develop the software, set up the sys- 
tem software infrastructure and verify the basic 
suitability of the machines for the task. 

• "Parasitic" testing in the real system was used 
to find problems with system stability or other 
operational issues. The connection-less and byte 
order independent event builder protocol al- 
lowed to use the port monitoring feature of the 
event building switch to send real-time copies of 
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datagrams to a small number of machines be- 
ing tested. In this mode, the switch was config- 
ured to send copies of all packets addressed to a 
Sun farm node to the interface of a Linux node 
that had been set up with the same MAC and 
IP addresses as the original node but had been 
blocked from sending any replies back. 

To ensure proper operation, two machines were 
run in this mode for several months, no problems 
were found. 

• Small-scale testing of up to six new machines in 
the real system. 

By the time of the purchase decision, the tested 
hardware was no longer available and wc had to 
go through another testing cycle with the successor 
model (Dell PowerEdge 1650). At that time it was 
decided to use 1.4GHz systems instead of 1.2 GHz. 

Compared to the 330MHz UltraSPARC-II baseline, 
our application runs 2.75 times as fast on a 1.4GHz 
Pentium-Ill CPU. Because of limitations in the num- 
ber of Gigabit Ethernet switch ports and cost issues, 
we decided to purchase 50 new machines in the final 
configuration: 

• Dual-Pentium-III, 1.4 GHz 

• 1GByte RAM 

• optical Intel El 000 Gigabit Ethernet card 

• single power supply 

• no remote lights-out management capabilities 
(during data taking there are always operators 
in the control room who can manually power 
cycle a machine if necessary) 

Compared to the old Sun farm the new farm pro- 
vides 8. 5 times the CPU power. 

5. DEPLOYMENT AND RESULTS 

During a 4-month shutdown in summer 2002, the 50 
new machines were installed in 3 water-cooled racks in 
the BABAR electronics hut. Since the Dell rack rails 
could not easily be made fit into the existing racks, 
pairs of machines were stacked on regular rack shelves. 
The electronics hut is accessible during data taking, so 
it was decided to save the cost of a remote power con- 
trol or reset system and simply install serial lines to 
the console ports. New modules with a total of 80 Gi- 
gabit Ethernet ports were installed in the Cisco 6509 



event building switch. To satisfy subdetector data ac- 
quisition needs during the shutdown, a part of the old 
farm was kept in place and operational. Installation 
and startup of the new farm was very smooth, one 
machine was broken on arrival and another machine 
failed after one week of operation. 

We did not encounter other major problems and 
since the start of BABAR Run 3 in November 2002 the 
new system is used in production. After additional 
improvements to the feature extraction code the sys- 
tem is now capable of handling 5.5kHz Level- 1 trigger 
rate at 30kByte event size and nominal background 
rates, limited by the feature extraction code in the 
ROMs. 

We did not observe any stability problems with the 
Linux farm machines. 



6. CONCLUSIONS 

The upgrade of the BABAR data acquisition system 
to Linux and Gigabit Ethernet has been successful 
and improved the maximum rate capability at nomi- 
nal backgrounds to 5.5kHz. 
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